DETAILED ACTION 



Response to Amendment 



1. In response to the office action from 12/13/2004, the applicant has submitted an 
amendment, filed 3/14/2005, amending claims 1 and 11, while arguing to traverse the art 
rejection based on the limitation regarding stored speech files (Amendment, Page 8). The 
applicant's arguments have been fully considered but are moot with respect to the new grounds 
of rejection in view of Rhie (U.S. Patent: 5,953,392) and Lumelsky (U.S. Patent: 6,081,780). 



Claim Rejections - 35 USC §103 



2. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

3. Claims 1 and 6 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Miyashita et al (U.S. Patent: 6,289,085) in view of Rhie et al (U.S. Patent: 5,953,392). 

With respect to Claim 1, Miyashita discloses: 

(a) Storing text files in a database at the remote location (electronic mail database, Col. 
16, Lines 52-59); 

(b) Converting, at the remote location, the text files stored in step (a) into speech files 
(Col. 17, Lines 4-8); 

(c) Receiving a request for a portion of the speech files converted in step (b) (requested 
reading of an email, Col. 17, Lines 40-55); 

(d) Transmitting to the information appliance the portion of the speech files requested in 
step (c) (Col. 17, Lines 9-22); and 

(e) Receiving and presenting the speech files transmitted in step (d) through audio 
speakers (telephone output of a speech signal, Col. 18, Lines 3-5). 

Although Miyashita teaches the conversion of a text file to speech, Miyashita does not 
teach a means for storing the converted files for playback upon a user request, however Rhie 
teaches a means for storing text-to-speech converted files for playback upon a user request 
(CMSI, Col. 3, Line 66- Col. 4, Line 20; and storing generated voice file, Col. 5, Lines 52-67). 

Miyashita and Rhie are analogous art because they are from a similar field of endeavor in 
text-to-speech conversion. Thus, it would have been obvious to a person of ordinary skill in the 
art, at the time of invention, to modify the teachings of Miyashita with the means for storing a 
synthesized speech file for playback to a user as recited by Rhie in order to provide further 
telephone voice services to a user and more efficient text processing by avoiding the need for any 
unnecessary text-to-speech processing through the retrieval of previously synthesized speech 
(Rhie, Col. 2, Lines 36-40; and Col. 5, Lines 52-67). 

With respect to Claim 6, Miyashita recites: 
Receiving a selection of one of multiple voice personalities, and converting the text files into 
speech files using the selected voice personality (Col. 7, Lines 37-41). 

4. Claims 2-4, 11, 13, and 14 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Miyashita et al in view of Rhie et al, and further in view of Hong et al (U.S. Patent: 
5,737,030). 

With respect to Claim 2, Miyashita in view of Rhie teaches the method for performing 
text-to-speech conversion at a server and transmitting the converted speech to a terminal device, 
as applied to Claim 1. Miyashita in view of Rhie does not specifically suggest method use in an 
EPG application, however Hong discloses: 

Step (e) includes receiving and presenting speech files of one of electronic program guide 
(EPG) information, weather information and news information (providing an audio 
representation of program guide information, Col. 7, Lines 1-16). 

Miyashita, Rhie, and Hong are analogous art because they are from a similar field of 
endeavor in audio signal processing. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Miyashita in view of Rhie 
with the method of providing an audio representation of EPG data as taught by Hong to provide 
illiterate or vision impaired individuals with a means of accessing television program 
information (Hong, Col. 2, Lines 40-43). 

With respect to Claim 3, Miyashita in view of Rhie teaches the method and 
corresponding steps for performing text-to-speech conversion at a server and transmitting the 
converted speech information to a terminal device upon a user request, as applied to Claim 1, 
while Hong teaches the use of speech synthesis in an EPG application as applied to Claim 2. 
Miyashita does not teach the additional steps of receiving a page location indication and 
transmitting speech data based upon the location, however Hong recites: 

(f) Receiving an indication of a location on the page of text (position information and 
cursor, Col. 4, Line 55- Col. 5, Line 14, Col. 6, Line 40- Col. 7, Line 16, and Fig. 5); and 

(g) Transmitting a portion of the EPG speech files corresponding to the received location 
indication (audio information corresponding to a program highlighted by a cursor, Col. 4, Line 
55- Col. 5, Line 14, Col. 6, Line 40- Col. 7, Line 16, and Fig. 5). 

Hong also discloses the ability to display EPG text as per Fig. 5. 

Miyashita, Rhie, and Hong are analogous art because they are from a similar field of 
endeavor in audio signal processing. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Miyashita in view of Rhie 
with the steps of receiving a page location indication and receiving speech data based upon the 
location as taught by Hong in order to provide an illiterate or vision impaired individual with 
program specific audio information (Hong, Col. 2, Lines 40-43). 

With respect to Claim 4, Hong additionally discloses: 

Step (f) includes receiving an indication of a location in the grid; and step (g) includes first 
transmitting speech files of the at least one date, multiple channels and multiple times and then 
separately transmitting speech files of the legend in the grid location indicated in step (f) (cursor, 
date, channel, and time, Fig. 5, and Col. 4, Line 55- Col. 5, Line 14, Col. 6, Line 40- Col. 7, Line 
16). 

Miyashita, Rhie, and Hong are analogous art because they are from a similar field of 
endeavor in audio signal processing. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Miyashita in view of Rhie 
with the means for receiving an indication of a grid location and separately transmitting speech 
files corresponding to each grid location as taught by Hong in order to provide an illiterate or 
visually impaired user with grid information through comprehensible audio means (Hong, Col. 
2, Lines 40-43). 

With respect to Claim 11, Miyashita discloses: 

Storing text files in a database at the remote location (electronic mail database, Col. 16, 
Lines 52-59); 

Converting, at the remote location, the text files stored in step (a) into audio files (Col. 
17, Lines 4-8); 

Receiving a request for a portion of the speech files converted in step (b) (requested 
reading of an email, Col. 17, Lines 40-55); 

Transmitting to the information appliance the portion of the audio files requested in step 
(c) (Col. 17, Lines 9-22); and 

Receiving and presenting the speech files transmitted in step (d) through audio speakers 
(telephone output of a speech signal, Col. 18, Lines 3-5). 

Although Miyashita teaches the conversion of a text file to speech, Miyashita does not 
teach a means for storing the converted files for playback upon a user request, however Rhie 
teaches a means for storing text-to-speech converted files for playback upon a user request 
(CMSI, Col. 3, Line 66- Col. 4, Line 20; and storing generated voice file, Col. 5, Lines 52-67). 

Miyashita and Rhie are analogous art because they are from a similar field of endeavor in 
text-to-speech conversion. Thus, it would have been obvious to a person of ordinary skill in the 
art, at the time of invention, to modify the teachings of Miyashita with the means for storing a 
synthesized speech file for playback to a user as recited by Rhie in order to provide further 
telephone voice services to a user and more efficient text processing by avoiding the need for any 
unnecessary text-to-speech processing through the retrieval of previously synthesized speech 
(Rhie, Col. 2, Lines 36-40; and Col. 5, Lines 52-67). 

Although Miyashita in view of Rhie teaches a system featuring similar functionality to 
the presently claimed invention, Miyashita in view of Rhie does not specifically suggest method 
use in an EPG application, however Hong teaches providing an audio representation of program 
guide information (Col. 7, Lines 1-16). Hong also teaches the use of a set top box for receiving 
such EPG information (Col. 7, Lines 17-21). 

Miyashita, Rhie, and Hong are analogous art because they are from a similar field of 
endeavor in audio signal processing. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Miyashita in view of Rhie 
with the method of providing an audio representation of EPG data as taught by Hong to provide 
illiterate or vision impaired individuals with a means of accessing television program 
information (Hong, Col. 2, Lines 40-43). 

With respect to Claim 13, Hong teaches the EPG speech data corresponding to a grid 
position as applied to Claim 4, and Miyashita, Rhie, and Hong are obvious in combination for 
the reasons given with respect to Claim 4. Also, it would be inherent that a speech file would be 
paused upon completing program information output and that additional program information 
would be supplied in response to a change in cursor position, since the audio EPG information is output 
upon changing a cursor position (Hong, Col. 4, Line 55- Col. 5, Line 14, Col. 6, Line 40- Col. 7, 
Line 16), thus providing the user with instant program information (Hong, Col. 7, Lines 29-35). 

With respect to Claim 14, Hong further discloses: 

Selecting the channel for one of listening and viewing (Col. 4, Line 43). 

Miyashita, Rhie, and Hong are analogous art because they are from a similar field of 
endeavor in audio signal processing. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Miyashita in view of Rhie 
with the means of selecting a channel for listening and viewing as taught by Hong, in order to 
allow an illiterate or visually impaired user to see a selected program in detail (Hong, Col. 7, 
Lines 23-35). 

5. Claim 5 is rejected under 35 U.S.C. 103(a) as being unpatentable over Miyashita et al in 
view of Rhie et al, and further in view of Oh (U.S. Patent: 6,141,642). 

With respect to Claim 5, Miyashita in view of Rhie teaches the method for performing 
text-to-speech conversion at a server and transmitting the converted speech to a terminal device, 
as applied to Claim 1. Although Miyashita further discloses performing the text-to-speech 
conversion for multiple languages (Col. 7, Lines 30-35), the use of separate synthesizers is not 
specifically suggested, however Oh shows: 

Converting the text files into speech files using a first text-to-speech (TTS) synthesizer 
and a second TTS synthesizer, whereby the first TTS synthesizer and the second TTS synthesizer 
use different languages (Fig. 2, Elements 212 and 214). 

Miyashita, Rhie, and Oh are analogous art because they are from a similar field of 
endeavor in speech synthesis. Thus, it would have been obvious to a person of ordinary skill in 
the art, at the time of invention, to modify the teachings of Miyashita in view of Rhie with the 
use of multiple TTS synthesizers corresponding to different languages as taught by Oh in order to 
provide text-to-speech synthesis for text that appears in multiple languages (Oh, Col. 1, Lines 
49-52). 



6. Claims 7 and 10 are rejected under 35 U.S.C. 103(a) as being unpatentable over Miyashita 
et al in view of Rhie, and further in view of Lumelsky (U.S. Patent: 6,081,780). 

With respect to Claim 7, Miyashita in view of Rhie teaches the method for performing 
text-to-speech conversion at a server and transmitting the converted speech to a terminal device, 
as applied to Claim 1. Miyashita in view of Rhie does not teach locally storing and extracting a 
synthesized speech file, however Lumelsky teaches such a process (Col. 10, Line 49- Col. 11, 
Line 10). 

Miyashita, Lumelsky, and Rhie are analogous art because they are from a similar field of 
endeavor in text-to-speech conversion. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Miyashita in view of Rhie 
with the means for locally storing and extracting a synthesized speech file as taught by Lumelsky 
in order to provide a means for a user to locally control the playback rate of a speech file 
(Lumelsky, Col. 11, Lines 4-10). 

With respect to Claim 10, Lumelsky teaches the local storage means for synthesized 
speech and further teaches transmitting updated speech data each time a user accesses a server 
(Col. 18, Lines 39-57). 



7. Claim 8 is rejected under 35 U.S.C. 103(a) as being unpatentable over Miyashita et al in 
view of Rhie et al, and further in view of Houser et al (U.S. Patent: 5,774,859). 

With respect to Claim 8, Miyashita in view of Rhie teaches the method for performing 
text-to-speech conversion at a server and transmitting the converted speech to a terminal device, 
as applied to Claim 1. Miyashita in view of Rhie does not specifically suggest the use of an audio 
output buffer, however, the use of such a buffer is well-known in the audio processing art as is 
evidenced by Houser: 

Step (e) includes buffering received speech files in a buffer of the information appliance, 
and presenting the buffered speech files through the audio speakers (Col. 13, Lines 11-31). 

Miyashita, Rhie, and Houser are analogous art because they are from a similar field of 
endeavor in text-to-speech conversion. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Miyashita with the use of an 
audio output buffer in order to provide temporary storage for necessary signal processing before 
an audio signal is sent to a speaker (Houser, Col. 13, Lines 11-31). 

8. Claim 9 is rejected under 35 U.S.C. 103(a) as being unpatentable over Miyashita et al in 
view of Rhie, and further in view of Cannon et al (U.S. Patent: 6,510,209). 

With respect to Claim 9, Miyashita in view of Rhie teaches the method for performing 
text-to-speech conversion at a server and transmitting the converted speech to a terminal device, 
as applied to Claim 1. Miyashita in view of Rhie does not teach presenting set-up configuration 
prompts to a user and implementing a predetermined input time period after issuing such a 
prompt, however Cannon discloses: 

(f) Presenting set-up configurations sequentially through the audio speaker (Fig. 4, 
Element 412); 

(g) Pausing the audio presented in step (f) between each set-up configuration (waiting a 
predetermined time period for an input command, Col. 6, Lines 4-15); and 

(h) Waiting a predetermined time period during each pause to receive an input command 
(waiting a predetermined time period for an input command, Col. 6, Lines 4-15). 

Miyashita, Rhie, and Cannon are analogous art because they are from a similar field of 
endeavor in network-enabled device control. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to modify the teachings of Miyashita in view of 
Rhie with the use of set-up configuration prompts and a predetermined time period for inputting 
a configuration command in order to allow a user to conveniently configure a device 
from a remote location (Cannon, Col. 1, Line 66- Col. 2, Line 2) while only accepting commands 
for a predetermined time period to prevent an unintended input from being improperly 
recognized as a command. 

9. Claim 12 is rejected under 35 U.S.C. 103(a) as being unpatentable over Miyashita et al in 
view of Rhie et al, further in view of Hong et al, and further in view of Houser et al. 

With respect to Claim 12, Miyashita in view of Rhie, and further in view of Hong 
teaches the method for performing EPG text-to-speech conversion at a server and transmitting 
the converted EPG speech data to a terminal device, as applied to Claim 11. Miyashita in view 
of Rhie, and further in view of Hong does not teach periodically transmitting EPG speech data, 
however Houser discloses: 

Receiving the EPG audio data at periodic time intervals (periodically updating and 
storing EPG information, Col. 23, Lines 7-37, which includes phonemic data, Col. 29, Lines 
23-49). 

Miyashita, Rhie, Hong, and Houser are analogous art because they are from a similar 
field of endeavor in audio signal processing. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to modify the teachings of Miyashita in view of 
Rhie, and further in view of Hong with the means for periodically transmitting and storing of 
EPG speech files at a local device as taught by Houser in order to ensure that device EPG speech 
data is up-to-date and accurate (Houser, Col. 23, Lines 30-34). 

10. Claim 15 is rejected under 35 U.S.C. 103(a) as being unpatentable over Lumelsky in 
view of Houser et al. 

With respect to Claim 15, Lumelsky discloses: 

A memory device (Fig. 4, Element 313); 

A modem adapted to connect to a network (Fig. 4, Element 320); 

A processor coupled to the modem for communicating on the network, receiving speech 
files from the network, and storing the speech files in the memory device (Col. 19, Lines 30-52). 

A receiver for accepting input commands from a remote control (hands-free voice 
controls, Col. 21, Lines 5-62). 

An audio speaker (Fig. 4, Element 325); 

The processor responsive to the input commands accepted by the receiver for extracting a 
portion of the speech files stored in the memory and sending the extracted portion of the speech 
files to the audio speaker (Col. 20, Line 13- Col. 21, Line 15). 

Although voice controls can be considered as a form of remote controls, Lumelsky does 
not specifically suggest a physical remote control device, however Houser discloses a physical 
remote control device for initiating speech recognition control commands and having keys as an 
alternate command entry means (Col. 19, Lines 5-26). 

Lumelsky and Houser are analogous art because they are from a similar field of endeavor 
in speech controlled media systems. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Lumelsky with the physical 
remote control device as taught by Houser to enhance a speech recognition interface by 
providing a user with a command initiation indication after pressing a controller button so that a 
user is aware that a command entry system is active (Houser, Col. 19, Lines 5-26). 

11. Claims 16-19 and 21 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Lumelsky in view of Houser et al, and further in view of Hong et al. 
With respect to Claim 16, Lumelsky discloses: 

A server coupled to the network (authoring system server, Fig. 1, Element 101); 

Text file storage, TTS synthesizer, and a transmitter for transmitting files onto the 
network (text files on a computer and TTS, Col. 12, Lines 44-58; and data transmission to a 
network, Col. 7, Lines 3-25); 

Lumelsky in view of Houser does not specifically suggest method use in an EPG 
application, however Hong teaches such a TTS application (providing an audio representation of 
program guide information, Col. 7, Lines 1-16). 

Lumelsky, Houser, and Hong are analogous art because they are from a similar field of 
endeavor in audio signal processing. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Lumelsky in view of Houser 
with the method of providing an audio representation of EPG data as taught by Hong to provide 
illiterate or vision impaired individuals with a means of accessing television program 
information (Hong, Col. 2, Lines 40-43). 

With respect to Claim 17, Houser further recites: 

The processor receives the EPG speech files and the EPG text files from the network 
(periodically updating and storing EPG information, Col. 23, Lines 7-37, which includes 
phonemic data, Col. 29, Lines 23-49); 

The processor formats the EPG text files into a page of text; and the processor provides 
the page for display on the television monitor (Fig. 11); 

The receiver receiving an input command which provides an identifier for identifying a 
location on the page displayed on the television monitor (cursor position, Col. 25, Lines 52-64). 

Houser does not specifically suggest providing audio program data based upon cursor 
position, however Hong teaches this limitation with respect to Claim 3. 

Lumelsky, Houser, and Hong are analogous art because they are from a similar field of 
endeavor in EPG data processing. Thus, it would have been obvious to a person of ordinary skill 
in the art, at the time of invention, to modify the teachings of Houser with the steps of receiving 
a page location indication and receiving speech data based upon the location as taught by Hong 
in order to provide an illiterate or vision impaired individual with program specific audio 
information (Hong, Col. 2, Lines 40-43). 

With respect to Claim 18, Hong additionally discloses the output of EPG speech data 
corresponding to grid position as applied to Claim 4. 

With respect to Claim 19, Hong teaches the EPG grid information acquisition means as 
applied to Claim 4, which downloads grid information, and more detailed program specific 
information separately, based upon cursor position. 

With respect to Claim 21, Lumelsky further discloses selecting a preferred speaker's 
voice (Col. 10, Line 49- Col. 11, Line 10). 

12. Claim 20 is rejected under 35 U.S.C. 103(a) as being unpatentable over Lumelsky in 
view of Houser et al, further in view of Hong, and further in view of Oh. 

Lumelsky in view of Houser et al, and further in view of Hong teaches the EPG speech 
synthesis system as applied to Claim 16. None of the aforementioned references 
specifically teaches the use of separate synthesizers, however Oh 
shows: 

Converting the text files into speech files using a first text-to-speech (TTS) synthesizer 
and a second TTS synthesizer, whereby the first TTS synthesizer and the second TTS synthesizer 
use different languages (Fig. 2, Elements 212 and 214). 

Lumelsky, Houser, Hong, and Oh are analogous art because they are from a similar field 
of endeavor in audio processing. Thus, it would have been obvious to a person of ordinary skill 
in the art, at the time of invention, to modify the teachings of Lumelsky in view of Houser et al, 
and further in view of Hong with the use of multiple TTS synthesizers corresponding to different 
languages as taught by Oh in order to provide text-to-speech synthesis for text that appears in 
multiple languages (Oh, Col. 1, Lines 49-52). 

Conclusion 

13. The prior art made of record and not relied upon is considered pertinent to applicant's 
disclosure: 

Matthews (U.S. Patent: 5,815,145)- teaches a means for playing a digitized audio file 
from an EPG database. 

Brown et al (U.S. Patent: 6,603,838)- teaches a voice messaging system for voice file 
retrieval and subsequent local storage. 

14. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to James S. Wozniak whose telephone number is (571) 272-7632 
and email is James.Wozniak@uspto.gov. The examiner can normally be reached on Mondays- 
Fridays, 8:30-4:30. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Wayne Young, can be reached at (571) 272-7582. The fax/phone number for the 
Technology Center 2600 where this application is assigned is (703) 872-9306. 

Any inquiry of a general nature or relating to the status of this application or proceeding 
should be directed to the technology center receptionist whose telephone number is 
(703) 306-0377. 

James S. Wozniak 
5/26/2005 



SUSAN MCFADDEN 
PRIMARY EXAMINER 




